home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Euroscene 2
/
Euroscene 2.iso
/
USEFUL
/
libs
/
Docs
/
HUFF.doc
< prev
next >
Wrap
Text File
|
1995-11-01
|
10KB
|
222 lines
HUFF
A dynamic huffman cruncher/decruncher
Version 0.62
Copyright 1992 by M.Zimmermann
License/Disclaimer
------------------
This library may be freely distributed with the XPK compression package,
as long as it is kept in its original, complete, and unmodified form. It
may not be distributed by itself or in a commercial package of any kind
without my permission.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.
About Huffmans
--------------
The idea of a huffman crunch is as follows: often used bytes (ie 8 bit
codes) get a shorter code than 8 bits, fi 5 bits. So everytime one of these
bytes occurs in the source file I save (in this example) 3 bits in the dest
file. To find out the optimum codes for each byte there is a simple method:
A tree is to be constructed so that the path from the root to each leaf
reflects the optimum (ie huffman) code. Unfortunately most computers (the
Amiga, too) are byte-oriented, which means a rather complex handling of
codes that are not a multiple of 8 bits. This results in pretty slow
compression & decompression. So this means that the xpkHUFF.library
probably won't be used for normal circumstances, but, as Dominik stated, it
may serve well as an example library.
There are three different huffman crunch algorithms known:
· static compression/decompression
· dynamic compression/decompression
· adaptive compression/decompression
What are the differences?
The static huffman uses a fix table to represent each byte of the source.
This, of course, makes sense only, if the structure of the data to be
crunched is known. In this case (for instance crunching an english text) a
fix table of codes is embedded in the code. Crunching other data than what
the table was generated for will probably result in worse compression or
even expansion.
This is what a dynamic huffman is avoiding: it first creates a
statistics table reflecting the frequency every byte occurs with and
generates a special tree/table for this case, so the dynamic huffman does a
good compression for this special data.
But there is something that can be improved, anyway: imagine, there is a
data block which contains many 'a's in it's first part and many 'z's in the
last part.... The dynamic huffman would represent the 'a's and 'z's with
short codes, of course. But it probably would be even better if the
crunch/decrunch tree would reflect the *current* data beeing processed
instead of the whole block, thus in resulting shorter codes for 'a' and 'z',
depending of the position in the data block. This is what an adaptive
huffman deals with: it creates the tree while crunching, without doing any
statistics or tree creation before crunching. It permanently updates it's
internal huffman tree. Therefore it doesn't have to include the information
about the decrunch tree in the crunched data.
Final words about huffmans ...
------------------------------
A stand-alone huffman will never achieve crunch results as fine as those
reached with most other crunchers, for these will not only regard the number
of occurances for each byte (as huffman does), but sequels of byte, too.
This means: If you create all permutations of a datablock, the huffman
crunched will always have the same length. Others won't, as they are
depending on the order of the crunched data, too.
Description
-----------
The library 'xpkHUFF.library' implements a dynamic huffman crunch
algorithm, even though the adaptive might result in slightly better crunch
results. However, this is more complex to implement and I'm using a maximum
buffer size of 64K, so this is a little bit like an adaptive huffman for
large files.
If I should have lots of spare time I will probably implement an adaptive
huffman crunch algorithm. This new library will be called xpkHUFF, too, and
new xpkHUFF.libraries will always handle output generated by earlier
versions.
The xpkHUFF.library supports a totally unsafe (but a little bit better
than simple eor :-) encryption. Please note that crunch/decrunch speeds
decrease when encryption is used.
The source ...
--------------
The source is provided, so you might have a look at different decrunch
codes. Please note: The only code I tested intensively is the byte/cache
decrunch code, which was used to create the library included in this
package. For it's the fasted code, you probably won't want to use another
one. However, this might help you to understand what huffmans are about.
If you have never written a huffman, have a look at it, a huffman code is
a pretty good idea. If you *have* written a huffman code and it's *faster*
than mine, please let me know, I will then update xpkHUFF.library.
Implementation
--------------
If you should see an errormessage saying output buffer too small while
crunching *and* encrypting, this means you tried to crunch and encrypt a
file that would crunched and encrypted be larger than the original file.
This should occur only with very small files (for I have a minimum file size
due to tables) or with files that have been crunched already and therefore
would expand during crunch.
A technical note: this could also happen, if the last chunk of a file to
be crunched/encrypted would be dimensioned too small by xpkmaster.library.
However, in this case you cannot encrypt the file. I know this could be
annoying and will think about a solution for this problem, but remember:
this encryption would not be safe, better if you used FEAL or IDEA for
secure encryption.
Statistics
----------
Following is a table briefly listing some comparative statistics for HUFF
without encryption. These were generated by xBench on the standard XPK
benchmark system (A3000/25 with SCRAM, using the AmigaVision executable as
data). Note that memory needs don't include xpkmaster.library's buffers.
Method Packing Unpacking Packing Unpacking Compression
Memory Memory Speed Speed Ratio
------ ------- --------- ------- --------- -----------
HUFF 30K 71K 88 K/s 138 K/s 24.1%
Where unpack speed varies depending on decrunch code (refer to source for
that).
Last words ...
--------------
I tried hard to debug this library with range checking while writing
bytes on crunching, and so on, but as in every code larger than, say 10
lines :-), there will be bugs. I don't know any bugs in this version, but
if you should meet one, please let me know via email (refer to end of this
document for my email adr). As usually, reproducable bugs are preferred.
Please add your configuration, programs running (best if you try without
startup-sequence!), and, most important of all, add the file you tried to
crunch! Thank you.
Version History
---------------
; V 0.1 - 12-Jul-1992 : first version
; V x.yy - 18-Jul-1992 : first OK version
; V x.yy - 19-Jul-1992 : sped up decrunching
; V x.yy - 21-Jul-1992 : bug fixed in word/long decrunching: min pack
; chunk size now 3/5
; V x.yy - 21-Jul-1992 : replaced many subq/bxx with dbf (ie sped up
; crunching a little bit), bug fixed: there was
; a dbf counter wrong (one of my favorite 0/1
; problem bugs)
; V 0.50 - 29-Jul-1992 : added 68030+ cache optimized decrunch code
; V 0.51 - 01-Aug-1992 : byte decrunch improved, first code added,
; indicator byte for crunchmethod used added,
; 68030+ chache optimized code does not make
; sense any more, since byte decrunch fits to
; cache completely, now
; V 0.52 - 01-Aug-1992 : unsafe encryption supported
; V 0.53 - 03-Aug-1992 : slight improvements made to crunch code
; (+ 6K/s)
; V 0.54 - 03-Aug-1992 : inconsistence in expansion handling fixed
; V 0.55 - 03-Aug-1992 : bug fixed: expansion handling now considers
; table creation, too
; V 0.56 - 03-Aug-1992 : bug fixed: HUFF now can crunch files
; consisting of always the same byte (shame
; on me, termination criterium was wrong)
; V 0.57 - 03-Aug-1992 : Tree creation code partially rewritten
; V 0.58 - 05-Aug-1992 : bug fixed: wrong termination criterium for
; expansion check (my favorite 0/1 problem)
; V 0.59 - 06-Aug-1992 : now decrypting in a special buffer, not using
; InBuf (this is read only, I was told) any more
; V 0.60 - 07-Aug-1992 : added extra memory required during
; packing/unpacking
; V 0.61 - 08-Aug-1992 : expansion check changed, renamed from
; xpkDHUF.library to xpkHUFF.library thus
; corresponding to the possibility of handling
; adaptive huffman codes later without having
; an inconsistence in the name
; V 0.62 - 10-Aug-1992 : Flag XPKIF_MODES removed (I don't have modes
; yet (but I have a mapping code :-=))
Contact Address
---------------
zimmerma@helios.ibr.cs.tu-bs.de (M. Zimmermann)